Abstract: The emergence of replicable genetic molecules was one of the defining events in the
origin of life. Their evolution can be conceptualized as a walk through the space of all possible
sequences. The theoretical concept of a fitness landscape helps to understand evolutionary
processes by assigning a fitness value to each genotype. Evolution of a phenotype is then
viewed as a series of consecutive single-point mutations. Natural selection biases evolution
toward peaks of high fitness and away from valleys of low fitness [1,2], whereas neutral drift
occurs in sequence space without direction as mutations are introduced at random. Large
networks of neutral or near-neutral mutations on a fitness landscape, especially for sufficiently
long genomes, are possible or even inevitable [1,3,4]. Their detection in experiments, however,
has been elusive. Although a few near-neutral evolutionary pathways have been found [5-7], recent
experimental evidence indicates that landscapes consist largely of isolated islands [8,9]. The
generality of these results, however, is not clear, as the genome length or the fraction of
functional molecules in the genotypic space might have been insufficient for the emergence of
large neutral networks. A thorough investigation of the structure of the fitness landscape is
essential to understanding the mechanisms of evolution of early genomes.


RNA molecules are commonly assumed to have played a pivotal role in the origin of genetic
systems. They are widely believed to be early, if not the earliest, genetic and catalytic molecules,
with abundant biochemical activities as aptamers and ribozymes, i.e. RNA molecules capable,
respectively, of binding small molecules or catalyzing chemical reactions. Here, we present results
of our recent studies on the structure of the sequence space of RNA ligase ribozymes selected
through in vitro evolution. Several hundred thousand sequences, active to varying degrees,
were obtained by deep sequencing. Analysis of these sequences revealed several large
clusters, defined such that every sequence in a cluster can be reached from any other sequence in
the same cluster through a series of single-point mutations. Sequences in a single cluster appear
to adopt more than one secondary structure. The mechanism of refolding within a single cluster
was examined. To shed light on possible evolutionary paths in the space of ribozymes, the
connectivity between clusters was investigated. The effect of RNA length on the structure
of the fitness landscape and on possible evolutionary paths was examined by comparing
functional sequences of 20 and 80 nucleobases in length. Sequences of different
lengths shared secondary structure motifs that were presumed responsible for catalytic activity,
with increasing complexity and global structural rearrangements emerging in longer molecules.
The emergence of replicable genetic molecules was
one of the defining events in the origin of life, and
their evolution can be conceptualized as a walk
through the space of all possible sequences. The
theoretical concept of a fitness landscape was
proposed by Wright [1,2] to help understand the
evolutionary process: each genotype is assigned a
fitness value, and the evolution of a phenotype
toward a local fitness peak is viewed as a series of
consecutive mutations. Natural selection biases
evolution toward peaks of high fitness and away
from valleys of low fitness, while neutral drift occurs
in sequence space without direction as mutations
are introduced at random [3,4].


Figure labels: connected peaks; isolated peak


- Isolated peaks: no pathways between peaks
consisting of consecutive, viable genotypes that
differ by a single mutation → evolutionary
optimization possible only through genetic
recombination or alterations to the landscape.


- Connected peaks: networks of neutral or
near-neutral mutations → large volumes of genotypic
space crossed without marked effect on fitness,
eventually chancing upon a new fitness peak.
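
The distinction between isolated and connected peaks can be illustrated with a small search over a set of viable genotypes. The following is only a sketch under stated assumptions: the function name `mutational_path`, the toy sequences, and the restriction to substitutions (no insertions or deletions) are illustrative choices, not the analysis used in this study.

```python
from collections import deque

ALPHABET = "ACGU"  # RNA bases; an assumption of this sketch


def mutational_path(start, goal, viable):
    """Return the shortest chain of viable genotypes from start to goal in
    which consecutive genotypes differ by one substitution, or None if the
    two genotypes are not connected (isolated peaks)."""
    viable = set(viable)
    if start not in viable or goal not in viable:
        return None
    previous = {start: None}  # genotype -> genotype it was reached from
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        # Enumerate every single-substitution neighbour of the current genotype.
        for pos in range(len(current)):
            for alt in ALPHABET:
                neighbour = current[:pos] + alt + current[pos + 1:]
                if neighbour in viable and neighbour not in previous:
                    previous[neighbour] = current
                    queue.append(neighbour)
    return None


# Hypothetical toy landscape: "AAA" and "CCC" are connected, "GGG" is isolated.
toy_viable = {"AAA", "AAC", "ACC", "CCC", "GGG"}
connected = mutational_path("AAA", "CCC", toy_viable)
isolated = mutational_path("AAA", "GGG", toy_viable)
```

A returned path corresponds to the connected-peaks case, while `None` corresponds to peaks that can only be bridged by recombination or a change in the landscape.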


• Advantages
- early genetic and catalytic molecules,
- abundant biochemical activities: aptamers and
ribozymes.


• System in this study: ligase ribozymes selected
through in vitro evolution.


• Determine connectivity in the sequence space for
small RNA ribozymes.
• Investigate length effects on activity and
connectivity to shed light on the evolutionary process
with increasing length of RNA ribozymes.


• In vitro evolution and high-throughput sequencing.


• Computational methods of sequencing analysis
and secondary structure prediction and comparison.
In vitro transcription


6 x 10^14 DNA templates
80 bp = 10^48 possible seq.
20 bp = 10^12 possible seq.
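
The sequence-space sizes follow from four possible bases at each random position, i.e. 4^N sequences for a random region of length N. The short sketch below works through that arithmetic and the upper bound on the fraction of the space a library could sample. The library size of 6 x 10^14 is an assumed reading of the template count given here, and the function names are illustrative.

```python
LIBRARY_SIZE = 6e14  # assumed reading of the number of DNA templates


def sequence_space_size(n):
    """Number of distinct sequences of length n over a four-letter alphabet."""
    return 4 ** n


def max_coverage(n, library_size=LIBRARY_SIZE):
    """Upper bound on the fraction of sequence space a library can contain,
    assuming every molecule in the library is unique."""
    return min(1.0, library_size / sequence_space_size(n))


for n in (20, 80):
    print(f"{n}N: {sequence_space_size(n):.2e} possible sequences, "
          f"coverage <= {max_coverage(n):.2e}")
```

Under these assumptions the 20N space (about 10^12 sequences) can in principle be sampled exhaustively, whereas a library of this size covers only a vanishing fraction of the 80N space (about 10^48 sequences).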
• Initial DNA library (>10^14 unique sequences)
• High-throughput sequencing

(> 3 x 10° reads per population) 
• Random length variation: 20N & 80N
• >3 x 10° sequences selected
• Distinct secondary structures:


• Secondary structure comparison:
Grey: random region; Green: ligation site
distance comparison in tree representation


• >4 x 10° unique sequences analyzed
• Clusters in sequence space
- connected through single
mutation within a cluster
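
A cluster in this sense is a connected component of the graph whose nodes are the observed sequences and whose edges join sequences differing by a single mutation. The sketch below shows one way such clusters could be computed with a union-find structure. It is illustrative only: it assumes equal-length sequences, treats a mutation as a substitution (no insertions or deletions), and the function name and toy data are hypothetical.

```python
from collections import defaultdict

ALPHABET = "ACGU"  # RNA bases; an assumption of this sketch


def single_mutation_clusters(sequences):
    """Group sequences into clusters such that any two members are linked by
    a chain of observed sequences, each step being one substitution."""
    seqs = list(dict.fromkeys(sequences))  # unique sequences, order preserved
    index = {s: i for i, s in enumerate(seqs)}
    parent = list(range(len(seqs)))

    def find(i):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    # Generating the 3 * L neighbours of each sequence and looking them up
    # avoids comparing all pairs of sequences.
    for s, i in index.items():
        for pos, base in enumerate(s):
            for alt in ALPHABET:
                if alt != base:
                    j = index.get(s[:pos] + alt + s[pos + 1:])
                    if j is not None:
                        union(i, j)

    clusters = defaultdict(list)
    for s, i in index.items():
        clusters[find(i)].append(s)
    return sorted(clusters.values(), key=len, reverse=True)


# Hypothetical toy data: three clusters of sizes 3, 2 and 1.
toy_clusters = single_mutation_clusters(
    ["AAAA", "AAAC", "AACC", "GGGG", "GGGU", "CCCC"]
)
```

Neighbour enumeration scales linearly with the number of sequences, which matters when the data set holds several hundred thousand sequences.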
• Secondary structure comparison within clusters


• Similar secondary structure in different clusters
- 4 large clusters (83% of all)
Secondary structure distance


• Distinct secondary structures within a cluster
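
To make the idea of a distance between secondary structures concrete, the sketch below computes the base-pair distance between two structures written in dot-bracket notation. This is a simpler stand-in, not the tree-representation comparison referred to on this poster. The function names and example structures are hypothetical, and pseudoknots are not handled.

```python
def base_pairs(structure):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return pairs


def base_pair_distance(a, b):
    """Number of base pairs present in exactly one of two equal-length structures."""
    if len(a) != len(b):
        raise ValueError("structures must have the same length")
    return len(base_pairs(a) ^ base_pairs(b))


# Hypothetical example: a two-pair hairpin versus a one-pair hairpin.
d = base_pair_distance("((..))", "(....)")
```

A distance of zero means identical pairing. Larger values indicate that more base pairs must be opened or closed to convert one fold into the other, which is one way to quantify how distinct the structures within a cluster are.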
- Common motif identified


Figure labels: 20N selection, sequence "A" (20N); 80N selection, from cluster 1;
80N selection, from cluster 2; sequence from peak *


• Variation of length for the common motif: 20N & 80N
• Increased motif length → structure complexity


• Enhanced activity in 80N


• Small RNA ligase with varied random region
selected through in vitro evolution


• Distinct secondary structures connected in
sequence space of selected RNA